Arabic OCR Segmented-based System

Authors

  • Hassanin M. Al-Barhamtoshy
  • Mohsen A. Rashwan
Abstract

This paper presents a new investigation of an Arabic OCR system for the offline recognition of machine-printed cursive words. A reliable transformation mechanism is used to convert image text into free text (ASCII or Unicode) that a computer can search directly. A traditional preprocessing model (the segmentation phase) extracts each word from the image text and divides it into segments. The recognition phase then estimates the likelihood of each possible text/character class given the segments. Many classifiers can be used for this, such as neural networks, Naïve Bayes, and HMM classifiers. These likelihoods are fed as input to a dedicated algorithm that recognizes the entire word. The proposed framework comprises three main stages: preparation, training, and testing. Data preparation covers scanning, image selection, alignment, identification of text regions, and separation of non-text or image regions. The training stage then extracts features and builds the related language model; these features are used in the third stage. For the first stage, the paper focuses on the techniques used for font sizing, binarization, skew correction, cleaning (denoising), and segmentation before recognition takes place. [Hassanin M. Al-Barhamtoshy and Mohsen A. Rashwan. Arabic OCR Segmented-based System. Life Sci J 2014;11(10):1273-1283]. (ISSN:1097-8135). http://www.lifesciencesite.com. 200
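The binarization and segmentation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed threshold, the vertical projection profile, and the function names `binarize`, `column_profile`, and `split_on_gaps` are all assumptions chosen for illustration. Because Arabic letters connect within a word, splitting on empty columns like this would be expected to isolate words or connected sub-words rather than individual characters.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows; 0 = black, 255 = white


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) if darker than the threshold, else 0.

    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def column_profile(binary: Image) -> List[int]:
    """Vertical projection: the number of ink pixels in each column."""
    return [sum(col) for col in zip(*binary)]


def split_on_gaps(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of ink, with end exclusive.

    Two ranges are separated when at least min_gap consecutive
    columns contain no ink.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # first column of the current ink run, if any
    gap = 0       # empty columns seen since the last ink column
    for x, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the final run, dropping any trailing empty columns.
        segments.append((start, len(profile) - gap))
    return segments


# Hypothetical 3x7 grayscale image with two ink regions.
gray = [
    [255, 0, 0, 255, 255, 10, 255],
    [255, 0, 255, 255, 255, 10, 255],
    [255, 255, 20, 255, 255, 255, 255],
]
profile = column_profile(binarize(gray))
segments = split_on_gaps(profile, min_gap=1)
```

Raising `min_gap` merges regions separated by narrow gaps, which is one simple way to choose between sub-word and word granularity in a sketch like this.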


Similar resources

Arabic & Urdu Text Segmentation Challenges & Techniques

Text segmentation is one of the critical and vital steps in an OCR system for any language, because the accuracy of OCR depends on correctly segmented characters. Segmentation divides the text image into its constituent parts (i.e. lines, components or words, and individual characters). As Urdu and Arabic are highly cursive and context sensitive in nature and have improper space between words therefore,...


A Novel Approach for Word Segmentation in Correlation based OCR System

This paper introduces a novel approach for word segmentation in an OCR system. Segmentation is one of the substantial subprocesses of an OCR system. The meaning of a word can change if the word is segmented incorrectly. A segmentation approach is formulated in which the textual area of the image is crimped as one large window. Then the large window is divided into small windows of different lines and w...


Speak Correct: A Computer Aided Pronunciation Training System for Native Arabic Learners of English

In this paper we introduce the SpeakCorrect system, a Computer Aided Pronunciation Training (CAPT) system for native Arabic students of English. The system is designed with optimized performance for the target user group. It is an L1-dependent system, and only the frequent pronunciation errors of native Arabic speakers are examined. Several adaptation techniques such as Speaker Adaptive Tr...


A Real-time DSP-Based Optical Character Recognition System for Isolated Arabic characters using the TI TMS320C6416T

Optical Character Recognition (OCR) is an area of research that has attracted the interest of researchers for the past forty years. Although the subject has been a central topic for many researchers for years, it remains one of the most challenging and exciting areas in pattern recognition. Since Arabic is one of the most widely used languages in the world, the demand for a robust OCR for this...


Large Vocabulary Arabic Online Handwriting Recognition System

Online handwriting recognition of Arabic script is a difficult problem since it is naturally both cursive and unconstrained. The analysis of Arabic script is further complicated by obligatory dots/strokes that are placed above or below most letters and are usually written in delayed order. In addition, Arabic language is rich in morphology and syntax which makes it a must for a good online handw...



Publication date: 2014